PyDigger - unearthing stuff about Python


Name | Version | Summary | Date
trafilatura | 1.9.0 | Python package and command-line tool designed to gather text on the Web, includes all necessary discovery and text processing components to perform web crawling, downloads, scraping, and extraction of main texts, metadata and comments. | 2024-05-02 10:17:30
courlan | 1.1.0 | Clean, filter and sample URLs to optimize data collection – includes spam, content type and language filters. | 2024-04-30 11:20:27
htmldate | 1.8.1 | Fast and robust extraction of original and updated publication dates from URLs and web pages. | 2024-04-11 14:50:20
simplemma | 0.9.1 | A simple multilingual lemmatizer for Python. | 2023-01-20 17:07:40
py3langid | 0.2.2 | Fork of the language identification tool langid.py, featuring a modernized codebase and faster execution times. | 2022-06-14 13:30:04
Adrien Barbaresi